33 research outputs found

    Semantic Web Modelling: Challenges and Opportunities in Small and Large Museum Collections

    Semantic Web technologies foster connection and contextualization. They can benefit museum collections by disclosing information in a scalable and interoperable way, aggregating previously heterogeneous and siloed data. Based on formal languages such as RDF, RDFS or OWL, they can describe the meaning and the connections among disparate data to define concepts, entities, and relationships and to facilitate multifaceted retrieval, reasoning, data integration and knowledge reuse. Benefits of Semantic Web technologies to the broader DH domain include, but are not limited to, harmonised views of distributed sources, semantic-based content aggregation, enrichment, search, browsing and recommendation. Over the last decades we have witnessed a proliferation of semantic web projects in the broader cultural heritage domain at a national and European level. Infrastructure programmes, such as EUROPEANA, DARIAH, PARTHENOS and ARIADNEplus, to name but a few, have delivered rich interoperable structures and innovations that advanced the tasks of data integration, sharing, analysis, retrieval, and visualisation. As conceptual models mature and expand, and CIDOC-CRM is becoming an undeniable standard in the domain, we reflect on the challenges and opportunities encountered when semantic web technologies are applied both to small regional and to large, globally renowned museum collections. The role and application of semantic modelling is examined through two distinct case studies: a) the regional Archaeological Museum of Tripolis (Greece), which has a limited digital presence but a unique collection of regional antiquities, and which employed semantic methods to enrich and share its digitised collection holdings, and b) the Sloane Lab (UK), which aims to aggregate a multitude of catalogue records (both historic and current, from multiple disciplines) dispersed across the British Museum, Natural History Museum and British Library.
The presentation delivers useful insight and highlights the opportunities and challenges both for small heritage organisations and large global institutions when applying high-level semantics to break down the silo barriers around museum items and enable interoperable and multi-layered representations.

    Sloane Lab: Domain Vocabularies for Semantic Interoperability of Museum Collections

    How do domain vocabularies and terminological resources contribute to semantic harmonisation and enrichment of siloed collections in digital infrastructures? What is the role of industry-standard and bespoke museum-owned authority files and terminologies in the process of ‘creating a unified virtual “national collection” by dissolving barriers between different collections and opening UK heritage to the world’ (Towards a National Collection, 2022)? The Sloane Lab aims to aggregate a multitude of catalogue records (both historic and current, from multiple disciplines) dispersed across the British Museum, Natural History Museum and British Library. The task of integrating these disparate records and facilitating interoperable access poses significant challenges. The competency of domain-oriented, standardised, high-level ontologies such as CIDOC-CRM to act as a common application layer of data semantics, and their capacity to enable innovative ways for cross-searching, contextual exploration and interrogation, is well documented in the literature. However, their ability to provide a common conceptual layer of high-level semantics for the purposes of unification, alignment and harmonisation comes at the expense of the specialisation of terminological and typological definitions. This can hinder the discovery and interrogation of resources at a higher level of granularity and limit the opportunities for entity enrichment and linking to external definitions from the Linked Data Cloud. We discuss a method of specialising upper-level ontologies by adding a further level of vocabulary semantics, drawn from thesauri, glossaries and authority files, to supplement the CIDOC-CRM with specialised terms. In this process, we highlight the role and contribution of museum-based vocabulary resources towards the realisation of unified collections and the opportunities they offer for semantic enrichment, linking and interoperability.
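
As a rough illustration of the specialisation idea described in this abstract, the sketch below types a catalogue item with a high-level CIDOC-CRM class and then attaches a finer-grained vocabulary concept through the CRM property P2 has type. It is not the Sloane Lab's implementation: the `example.org` namespaces, the `specialise` helper and the item and concept identifiers are all hypothetical, and the CRM class label follows older RDF encodings (newer releases name E22 "Human-Made Object").

```python
# Hypothetical namespaces, used for illustration only.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
VOCAB = "http://example.org/vocab/"   # stand-in for a museum thesaurus
ITEM = "http://example.org/item/"     # stand-in for catalogue records


def specialise(item_id, crm_class, concept_id):
    """Return triples that type an item with a CRM class and a vocabulary concept."""
    subject = ITEM + item_id
    concept = VOCAB + concept_id
    return [
        # High-level semantics: the item is an instance of a CRM class.
        (subject, "rdf:type", CRM + crm_class),
        # Specialisation: P2 has type points at the thesaurus concept.
        (subject, CRM + "P2_has_type", concept),
        # The thesaurus concept itself is modelled as an E55 Type.
        (concept, "rdf:type", CRM + "E55_Type"),
    ]


if __name__ == "__main__":
    for triple in specialise("obj-001", "E22_Man-Made_Object", "amphora"):
        print(triple)
```

Plain tuples are used instead of an RDF library so that the sketch runs without dependencies; in practice the same three statements would be written to a triple store.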

    A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain

    The paper presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. Performance is evaluated via the Gold Standard method. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH, together with concepts from English Heritage thesauri and glossaries. Relation Extraction performance benefits from a syntactic-based definition of relation extraction patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive NLP modules relating to word-sense disambiguation, negation detection and noun phrase validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources and empirical derivation of context-driven relation extraction rules for the recognition of semantic relationships from phrases of unstructured text. The semantic annotations have proven capable of supporting semantic query, document study and cross-searching via the ontology framework.
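
To make the combination of gazetteer-driven entity recognition and negation detection more concrete, here is a minimal sketch. It does not reproduce OPTIMA or its GATE rules: the three-term `GAZETTEER`, the list of negation cues, the four-token look-back window and the CRM class labels attached to each term are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-gazetteer mapping terms to illustrative CRM class labels.
GAZETTEER = {
    "pottery": "E19 Physical Object",
    "ditch": "E53 Place",
    "roman": "E49 Time Appellation",
}

# Assumed negation cues; a real rule set would be far richer.
NEGATION_CUES = re.compile(r"\b(no|not|without|absence of)\b")


def annotate(sentence, window=4):
    """Return (term, class label, negated) for each gazetteer term in the sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    annotations = []
    for i, token in enumerate(tokens):
        if token in GAZETTEER:
            # Look for a negation cue in the few tokens before the match.
            preceding = " ".join(tokens[max(0, i - window):i])
            negated = bool(NEGATION_CUES.search(preceding))
            annotations.append((token, GAZETTEER[token], negated))
    return annotations


if __name__ == "__main__":
    for annotation in annotate("No pottery was found in the ditch"):
        print(annotation)
```

Flagging the negated mention rather than dropping it keeps the decision about positive assertions with the indexing step, which is one possible design among several.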

    Negation detection and word sense disambiguation in digital archaeology reports for the purposes of semantic annotation

    The paper presents the role and contribution of Natural Language Processing techniques, in particular Negation Detection and Word Sense Disambiguation, in the process of Semantic Annotation of Archaeological Grey Literature. Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence but at the same time can be a hindrance to the accurate indexing of documents with respect to positive assertion.

    A pilot investigation of Information Extraction in the semantic annotation of archaeological reports

    The paper discusses a prototype investigation of semantic annotation, a form of metadata assigning conceptual entities to textual instances, in the case of archaeological grey literature. The use of Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while the use of Knowledge Organization Systems (KOS) is explored for the association of semantic annotation with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension and English Heritage thesauri and glossaries. Results are reported from an initial evaluation, which suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed drawing on the evaluation and consideration of the characteristics of the archaeology domain. Copyright © 2012 Inderscience Enterprises Ltd

    Semantic Indexing via Knowledge Organization Systems: Applying the CIDOC-CRM to Archaeological Grey Literature

    The volume of archaeological reports being produced since the introduction of PPG16 has significantly increased, as a result of the increased volume of archaeological investigations conducted by academic and commercial archaeology. It is highly desirable to be able to search effectively within and across such reports in order to find information that promotes quality research. A potential dissemination of information via semantic technologies offers the opportunity to improve archaeological practice, not only by enabling access to information but also by changing how information is structured and the way research is conducted. This thesis presents a method for automatic semantic indexing of archaeological grey-literature reports using rule-based Information Extraction techniques in combination with domain-specific ontological and terminological resources. This semantic annotation of contextual abstractions from archaeological grey literature is driven by Natural Language Processing (NLP) techniques which are used to identify “rich” meaningful pieces of text, thus overcoming barriers in document indexing and retrieval imposed by the use of natural language. The semantic annotation system (OPTIMA) performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the ISO Standard (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH, together with concepts from English Heritage thesauri and glossaries. The results demonstrate that the techniques can deliver semantic annotations of archaeological grey literature documents with respect to the domain conceptual models. Such semantic annotations have proven capable of supporting semantic query, document study and cross-searching via web-based applications.
The research outcomes have provided semantic annotations for the Semantic Technologies for Archaeological Resources (STAR) project, which explored the potential of semantic technologies in the integration of archaeological digital resources. The thesis represents the first discussion on the employment of CIDOC CRM and CRM-EH in semantic annotation of grey-literature documents using rule-based Information Extraction techniques driven by a supplementary exploitation of domain-specific ontological and terminological resources. It is anticipated that the methods can be generalised in the future to the broader field of Digital Humanities

    A comparison of machine learning and rule-based approaches for text mining in the archaeology domain, across three languages

    Archaeology is a destructive process in which the evidence primarily becomes written documentation. As such, the archaeological domain creates huge amounts of text, from books and scholarly articles to unpublished ‘grey literature’ fieldwork reports. We are experiencing a significant increase in archaeological investigations, and easy access to the information hidden in these texts is a substantial problem for the archaeological field, one that was identified as early as 2005 (Falkingham 2005). In the Netherlands alone, it is estimated that 4,000 new grey literature reports are being created each year, as well as numerous books, papers and monographs. Furthermore, as research – such as desk-based assessments – is increasingly being carried out online remotely, these documents need to be made more easily Findable, Accessible, Interoperable and Reusable. Making these documents searchable and analysing them is a time-consuming task when done by hand, and will often lack consistency. Text mining provides methods for disclosing information in large text collections, allowing researchers to locate (parts of) texts relevant to their research questions, as well as being able to identify patterns of past behaviour in these reports. Furthermore, it enables resources to be searched in meaningful ways using semantic interoperable vocabularies and domain ontologies to answer questions on what, where and when. The EXALT project at Leiden University is working on creating a semantic search engine for archaeology in and around the Netherlands, indexing all available, open-access texts, which includes Dutch, English and German language documents. In this context, we are systematically researching and comparing different methods for extracting information from archaeological texts in these three languages. The specific task we are looking at is Named Entity Recognition (NER), which is to find and recognise certain concepts in text, e.g. artefacts, time periods, places, etc.
In the archaeology domain, the task of entity recognition is particularly specialised and determined by domain semantics that pose challenges to conventional NER. We develop text mining applications tailored to the archaeological domain and in this process we will compare a rule-based knowledge driven approach (using GATE), a ‘traditional’ machine learning method (Conditional Random Fields), and a deep learning method (BERT). Previous studies have investigated different applications of text mining in archaeological literature (Richards et al. 2015), but this often occurred at a relatively small scale, in isolated case studies, or as proof-of-concept type work. With this study, we are comparing multiple methods in multiple languages, and we aim to contribute to guidelines and good practice for text mining in archaeology. Specifically, we will compare not only the overall accuracy of each approach, but also the time, digital literacy, hardware, and labelled data needed to run each method. We also pay attention to the energy usage and CO2 output of these machine learning models and the impact on climate change, something that’s particularly poignant during the ongoing energy crisis. Besides these more practical aspects, we also aim to describe some general properties of the way we write about archaeology, and how writing in a particular language can make knowledge transfer (and by extension, NER) easier or more difficult
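
When NER approaches such as those named in this abstract are compared against labelled data, accuracy is commonly reported as precision, recall and F1 over entity spans. The sketch below shows that calculation under the assumption of exact-match scoring, where a predicted entity counts only if its start offset, end offset and label all agree with the gold annotation. The `prf` helper and the sample spans are hypothetical and are not taken from the study.

```python
def prf(gold, predicted):
    """Compute exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical annotations for one document.
    gold = [(0, 7, "ARTEFACT"), (20, 25, "PERIOD"), (30, 36, "PLACE")]
    predicted = [(0, 7, "ARTEFACT"), (20, 25, "PLACE")]
    precision, recall, f1 = prf(gold, predicted)
    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is the strictest option; partial-overlap scoring is a common alternative and would give higher figures for the same predictions.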

    Automatic metadata generation in an archaeological digital library: Semantic annotation of grey literature

    This paper discusses the automatic generation of rich metadata from excavation reports from the Archaeology Data Service library of grey literature (OASIS). The work is part of the STAR project, in collaboration with English Heritage. An extension of the CIDOC CRM ontology for the archaeological domain acts as a core ontology. Rich metadata is automatically extracted from grey literature, directed by the CRM, via a three-phase process of semantic enrichment employing the GATE toolkit augmented with bespoke rules and knowledge resources. The paper demonstrates the potential of combining knowledge-based resources (ontologies and thesauri) in information extraction, and techniques for delivering the automatically extracted metadata as XML annotations coupled with the grey literature reports and as RDF graphs decoupled from content. Examples from two consuming applications are discussed: the Andronikos web portal, which serves the annotated XML files for visual inspection, and the STAR project research demonstrator, which offers unified search across archaeological excavation data and grey literature via the core ontology CRM-EH.